IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re U.S. National Phase of 
PCT/US99/22259 of: 

JEAN-MARC JOT, et al. 

Application No.: Not yet assigned 

Filed: Herewith 

For: METHOD AND APPARATUS FOR 
THREE-DIMENSIONAL AUDIO 
DISPLAY 



PRELIMINARY AMENDMENT 



San Francisco, CA 94111 
March 26, 2001 

Assistant Commissioner for Patents 
Washington, D.C. 20231 

Sir: 

Simultaneously with the filing of this application, please amend it as indicated 
below. Marked-up versions of the changes to the claims are attached to this Preliminary 
Amendment. 

IN THE CLAIMS:

Please substitute the following amended, clean versions of the indicated claims: 

36. (amended) The method according to claim 34 wherein one or more 
of the spatial functions have their principal direction corresponding to the direction of one of 
the loudspeakers. 

37. (amended) The method according to claim 33 including performing 
cross-talk cancellation of the left and right audio signals before feeding the loudspeakers. 

38. (amended) The method of claim 34 further including: 

producing left-front and left-back signals based on the left-channel audio signal;
producing right-front and right-back signals based on the right-channel audio
signal; and

combining the left-front, left-back, right-front, and right-back signals to produce
outputs suitable for playback with a pair of front speakers and a pair of rear speakers.



REMARKS:

Claims 1-49 are pending. 

Amendment is made to eliminate all multiple dependencies from the claims, 
thereby avoiding the need to pay the multiple dependent surcharge. 

Respectfully submitted, 
San Francisco, California 94111-3834 

Tel: (415) 576-0200 

Fax: (415) 576-0300 
MARKED-UP VERSION OF THE CHANGES TO THE CLAIMS 

IN THE CLAIMS:

36. (amended) The method according to [claims 34 or 35] claim 34 
wherein one or more of the spatial functions have their principal direction corresponding to the 
direction of one of the loudspeakers. 



37. (amended) The method according to [claims 33 or 36] claim 33 
including performing cross-talk cancellation of the left and right audio signals before feeding 
the loudspeakers. 



38. (amended) The method of [claims 34 or 35] claim 34 further
including:

producing left-front and left-back signals based on the left-channel audio signal;
producing right-front and right-back signals based on the right-channel audio
signal; and

combining the left-front, left-back, right-front, and right-back signals to produce
outputs suitable for playback with a pair of front speakers and a pair of rear speakers.



WO 00/19415 



89/8 06193 
Rec'd PCT/PTO 26 MAR 2001

PCT/US99/22259 



METHOD AND APPARATUS FOR
THREE-DIMENSIONAL AUDIO DISPLAY

FIELD OF THE INVENTION 
The present invention relates generally to audio recording, and more specifically to 
the mixing, recording and playback of audio signals for reproducing real or virtual 
three-dimensional sound scenes at the eardrums of a listener using loudspeakers or 
headphones. 

BACKGROUND 

A well-known technique for artificially positioning a sound in a multi-channel
loudspeaker playback system consists of weighting an audio signal by a set of
amplifiers feeding each loudspeaker individually. This method, described e.g. in
[Chowning71], is often referred to as "discrete amplitude panning" when only the
loudspeakers closest to the target direction are assigned non-zero weights, as
illustrated by the graph of panning functions in Fig. 1. Although Fig. 1 shows a
two-dimensional loudspeaker layout, the method can be extended with no difficulty to
three-dimensional loudspeaker layouts, as described e.g. in [Pulkki97]. A drawback
of this technique is that it requires a high number of channels to provide a faithful
reproduction of all directions. Another drawback is that the geometrical layout of the
loudspeakers must be known at the encoding and mixing stage.

An alternative approach, described in [Gerzon85], consists of producing a 'B-Format' 
multi-channel signal and reproducing this signal over loudspeakers via an 
'Ambisonic' decoder, as illustrated in Fig. 2. Instead of discrete panning functions, the
B-Format uses real-valued spherical harmonics. The zero-order spherical harmonic
function is named W, while the three first-order harmonics are denoted X, Y, and Z. 
These functions are defined as follows: 

$$W(\theta, \varphi) = 1$$

$$X(\theta, \varphi) = \cos(\varphi)\cos(\theta)$$

$$Y(\theta, \varphi) = \cos(\varphi)\sin(\theta)$$

$$Z(\theta, \varphi) = \sin(\varphi)$$
where $\theta$ and $\varphi$ denote respectively the azimuth and elevation angles of the sound
source with respect to the listener, expressed in radians. An advantage of this
technique over the discrete panning method is that B-Format encoding does not
require knowledge of the loudspeaker layout, which is taken into account in the design
of the decoder. A second advantage is that a real-world B-Format recording can be
produced with practical microphone technology, known as the 'Soundfield
Microphone' [Farrah79]. As illustrated in Fig. 2, this allows for combining
microphone-encoded sounds with electronically encoded sounds to produce a single
B-Format recording. First-order Ambisonic decoders do not reconstruct the acoustic
pressure information at the ears of the listener except at low frequencies (below about
700 Hz). As described e.g. in [Bamford95], the frequency range can be extended by
increasing the order of spherical harmonics, but only at the expense of a higher
number of encoding channels and loudspeakers.
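The following minimal Python sketch shows the B-Format encoding described above. It evaluates the four harmonics at a source direction and weights a mono signal by them. The function names are hypothetical. W is left unscaled (W = 1) as in the definitions above, although some B-Format conventions scale W differently (for example by 1/sqrt(2)).

```python
import math

def b_format_gains(azimuth, elevation):
    """Return the (W, X, Y, Z) gains for one direction (angles in radians)."""
    w = 1.0                                        # zero-order harmonic
    x = math.cos(elevation) * math.cos(azimuth)    # front-back component
    y = math.cos(elevation) * math.sin(azimuth)    # left-right component
    z = math.sin(elevation)                        # up-down component
    return (w, x, y, z)

def encode_b_format(samples, azimuth, elevation):
    """Weight a mono signal by the four gains, giving four B-Format channels."""
    gains = b_format_gains(azimuth, elevation)
    return [[g * s for s in samples] for g in gains]

# A source straight ahead lands entirely in W and X.
channels = encode_b_format([1.0, 0.5, -0.25], azimuth=0.0, elevation=0.0)
```

Because the gains depend only on direction, the loudspeaker layout never enters the encoder. This is the layout independence noted in the text.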

3-D audio reproduction techniques which specifically aim at reproducing the acoustic
pressure at the two ears of a listener are usually termed binaural techniques. This
approach is illustrated in Fig. 3 and reviewed e.g. in [Jot95]. A binaural recording can
be produced by inserting miniature microphones in the ear canals of an individual or
a dummy head. Binaural encoding of an audio signal (also called binaural synthesis)
can be performed by applying to a sound signal a pair of left and right filters modeling
the head-related transfer functions (HRTFs) measured on an individual or a dummy
head for a given direction. As shown in Fig. 3, an HRTF can be modeled as a cascaded
combination of a delaying element and a minimum-phase filter, for each of the left
and right channels. A binaurally encoded or recorded signal is suitable for playback
over headphones. For playback over loudspeakers, a cross-talk canceller is used, as
described e.g. in [Gardner97].
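The following minimal sketch illustrates the HRTF model of Fig. 3. Each ear gets a pure delay followed by convolution with its minimum-phase filter. It assumes integer sample delays and FIR coefficients for the minimum-phase filters are already available. The arguments are hypothetical inputs, not measured data.

```python
import numpy as np

def binaural_synthesis(x, delay_left, delay_right, h_left, h_right):
    """Render a mono signal to (left, right) with a delay and an FIR filter per ear.

    delay_left, delay_right: non-negative integer delays in samples
    h_left, h_right: FIR coefficients approximating the minimum-phase HRTFs
    Returns a (2, n) array, both ears zero-padded to a common length n.
    """
    x = np.asarray(x, dtype=float)

    def one_ear(delay, h):
        delayed = np.concatenate([np.zeros(delay), x])   # pure delay element
        return np.convolve(delayed, h)                   # minimum-phase filter

    left = one_ear(delay_left, h_left)
    right = one_ear(delay_right, h_right)
    n = max(len(left), len(right))
    return np.stack([np.pad(left, (0, n - len(left))),
                     np.pad(right, (0, n - len(right)))])
```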

Conventional binaural techniques can provide a more convincing 3-D audio 
reproduction, over headphones or loudspeakers, than the previously described 
techniques. However, they are not without their own drawbacks and difficulties. 

- Compared to discrete amplitude panning or B-Format encoding, binaural synthesis
involves a significantly larger amount of computation for each sound source. An
accurate finite impulse response (FIR) model of an HRTF typically requires a 1-ms
long response, i.e. approximately 100 additions and multiplies per sample
period at a sample rate of 48 kHz, which amounts to 5 MIPS (million instructions
per second).
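The figures above can be checked with a few lines of arithmetic. This sketch assumes the count refers to a single FIR filter with one multiply and one add per tap.

```python
sample_rate_hz = 48_000
response_s = 1e-3                               # 1-ms FIR response

taps = round(sample_rate_hz * response_s)       # number of FIR coefficients
ops_per_sample = 2 * taps                       # one multiply + one add per tap
mips = ops_per_sample * sample_rate_hz / 1e6    # million operations per second

print(taps, ops_per_sample, round(mips, 2))
```

This gives 48 taps, 96 operations per sample and about 4.6 million operations per second. These values are consistent with the rounded figures of 100 operations and 5 MIPS.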

- The HRTF can only be measured at a set of discrete positions around the head. 
Designing a binaural synthesis system which can faithfully reproduce any 
direction and smooth dynamic movements of sounds is a challenging problem 
involving interpolation techniques and time-variant filters, implying an additional 
computational effort. 

- The binaurally recorded or encoded signal contains features related to the 
morphology of the torso, head, and pinnae. Therefore the fidelity of the 
reproduction is compromised if the listener's head is not identical to the head used 
in the recording or the HRTF measurements. In headphone playback, this can 
cause artifacts such as an artificial elevation of the sound, front-back confusions or 
inside-the-head localization. 

- In reproduction over two loudspeakers, the listener must be located at a specific 
position for lateral sound locations to be convincingly reproduced (beyond the 
azimuth of the loudspeakers), while rear or elevated sound locations cannot be 
reproduced reliably. 

[Travis96] describes a method for reducing the computational cost of the binaural 
synthesis and addresses the interpolation and dynamic issues. This method consists of 
combining a panning technique designed for N-channel loudspeaker playback and a 
set of N static binaural synthesis filter pairs to simulate N fixed directions (or "virtual 
loudspeakers") for playback over headphones. This technique leads to the topology of 
Fig. 4a, where a bank of binaural synthesis filters is applied after panning and mixing 
of the source signals. An alternative approach, described in [Gehring96], consists of 
applying the binaural synthesis filters before panning and mixing, as illustrated in Fig. 
4b. The filtered signals can be produced off-line and stored so that only the panning 
and mixing computations need to be performed in real time. In terms of reproduction 
fidelity, these two approaches are equivalent. Both suffer from the inherent 
limitations of the multi-channel positioning techniques. Namely, they require a large 
number of encoding channels to faithfully reproduce the localization and timbre of 
sound signals in any direction. 
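The following NumPy sketch makes the post-filtering topology of Fig. 4a concrete. Sources are panned and mixed into N channels, and each channel is then filtered by a fixed HRTF pair acting as a "virtual loudspeaker". The array shapes and function name are illustrative assumptions.

```python
import numpy as np

def virtual_loudspeaker_render(sources, gains, hrir_left, hrir_right):
    """Post-filtering topology: pan and mix first, then one HRTF pair per channel.

    sources:    (S, T) mono source signals
    gains:      (S, N) panning weights, one row per source
    hrir_left:  (N, K) left-ear impulse responses, one per virtual loudspeaker
    hrir_right: (N, K) right-ear impulse responses
    Returns the left and right ear signals, each of length T + K - 1.
    """
    sources = np.asarray(sources, dtype=float)
    gains = np.asarray(gains, dtype=float)
    channels = gains.T @ sources   # (N, T): channel n = sum_s gains[s, n] * sources[s]
    left = sum(np.convolve(ch, h) for ch, h in zip(channels, hrir_left))
    right = sum(np.convolve(ch, h) for ch, h in zip(channels, hrir_right))
    return left, right
```

Every stage is linear. Filtering each source with the combined response (the sum over n of gains[s, n] * h[n]) before mixing, which is the pre-filtering order of Fig. 4b, therefore yields the same output. This is the equivalence noted above.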
[Lowe95] describes a variation of the topology of Fig. 4a, in which the directional
encoder generates a set of two-channel (left and right) audio signals, with a
direction-dependent time delay introduced between the left and right channels, and each
two-channel signal is panned between front, back and side "azimuth placement"
filters. [Chen96] uses an analysis method known as principal component analysis
(PCA) to model any set of HRTFs as a sum of frequency-dependent
functions weighted by functions of direction. The two sets of functions are
listener-specific (uniquely associated with the head on which the HRTFs were measured) and can
be used to model the left filter and the right filter applied to the source signal in the
directional encoder. [Abel97] also shows the topologies of Figs. 4a and 4b and uses a
singular value decomposition (SVD) technique to model a set of HRTFs in a manner
essentially equivalent to the method described in [Chen96], resulting in the
simultaneous solution for a set of filters and the directional panning functions.

There remains a need for a computationally efficient technique for high-fidelity 3-D
audio encoding and mixing of multiple audio signals. It is desirable to provide an
encoding technique that produces a non-listener-specific format. There is a need for a
practical recording technique and suitably designed decoders to provide faithful
reproduction of the pressure signals at the ears of a listener, over headphones or
two-channel and multi-channel loudspeaker playback systems.

SUMMARY OF THE INVENTION 
A method for positioning an audio signal includes selecting a set of spatial functions
and providing a set of amplifiers. The gains of the amplifiers depend on
scaling factors associated with the spatial functions. An audio signal is received and a
direction for the audio signal is determined. The scaling factors are adjusted 
depending on the direction. The amplifiers are applied to the audio signal to produce 
first encoded signals. The audio signal is then delayed. The second filters are then 
applied to the delayed signal to produce second encoded signals. The resulting 
encoded signals contain directional information. In one embodiment of the invention, 
the spatial functions are the spherical harmonic functions. The spherical harmonics 
may include zero-order and first-order harmonics and higher order harmonics. In 
another embodiment, the spatial functions include discrete panning functions. 
Further in accordance with the method of the invention, decoding the
directionally encoded audio includes providing a set of filters. The filters are defined
based on the selected spatial functions.

An audio recording apparatus includes first and second multiplier circuits having 
adjustable gains. A source of an audio signal is provided, the audio signal having a 
time-varying direction associated therewith. The gains are adjusted based on the 
direction for the audio. A delay element inserts a delay into the audio signal. The 
audio and delayed audio are processed by the multiplier circuits, thereby creating 
directionally encoded signals. In one embodiment, an audio recording system 
comprises a pair of soundfield microphones for recording an audio source. The 
soundfield microphones are spaced apart at the positions of the ears of a notional 
listener. 

According to the invention, a method for decoding includes deriving a set of spectral 
functions from preselected spatial functions. The resulting spectral functions are the 
basis for digital filters which comprise the decoder. 

According to the invention, a decoder is provided comprising digital filters. The 
filters are defined based on the spatial functions selected for the encoding of the audio 
signal. The filters are arranged to produce output signals suitable for feeding into 
loudspeakers. 

The present invention provides an efficient method for 3-D audio encoding and
playback of multiple sound sources based on the linear decomposition of HRTFs using
spatial panning functions and spectral functions, which:

- guarantees accurate reproduction of interaural time difference (ITD) cues for all
sources over the whole frequency range;

- uses predetermined panning functions.

The use of predetermined panning functions offers the following advantages over
methods of the prior art which use principal component analysis or singular value
decomposition to determine panning functions and spectral functions:

- efficient implementation in hardware or software;

- non-individual encoding/recording format;

- adaptation of the decoder to the listener;

- improved multi-channel loudspeaker playback.

Two particularly advantageous choices for the panning functions are detailed, offering
additional benefits:

- Spherical harmonics:
  - allow recordings to be made using available microphone technology (a pair of
    Soundfield microphones);
  - yield a recording format that is a superset of the B-Format standard;
  - are associated with a special decoding technique for multi-channel loudspeaker playback.

- Discrete panning functions:
  - guarantee exact reproduction of chosen directions;
  - increase the efficiency of implementation (by minimizing the number of non-zero
    panning weights for each source);
  - are associated with a special decoding technique for multi-channel loudspeaker playback.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1: Discrete panning over 4 loudspeakers. Example of discrete panning
functions.

Figure 2: B-format encoding and recording. Playback over 6 loudspeakers using 
Ambisonic decoding. 

Figure 3: Binaural encoding and recording. Playback over 2 speakers using cross-talk 
cancellation. 

Figure 4: (a) Post-filtering topology, (b) Pre-filtering topology.

Figure 5: (a) Post-filtering and (b) pre-filtering topologies, with control of interaural
time difference for each sound source.

Figure 6: Binaural B Format encoding with decoding for playback over
headphones.

Figure 7: Original and reconstructed HRTF with Binaural B Format (first-order 
reconstruction). 

Figure 8: Binaural B Format reconstruction filters (amplitude frequency response). 

Figure 9: Binaural B Format decoder for playback over 4 speakers. 

Figure 10: Binaural Discrete Panning using 6 encoding channels, with decoder for 
playback over 2 speakers with cross-talk cancellation. 

Figure 11: Binaural Discrete Panning using 6 encoding channels, with decoder for
playback over 4 speakers with cross-talk cancellation. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Modeling HRTF using predetermined spatial functions 

Given a set of N spatial panning functions $\{g_i(\theta, \varphi),\ i = 0, 1, \ldots, N-1\}$, the procedure for
modeling HRTFs according to the present invention is as follows. This procedure is
associated with the topologies described in Fig. 5a and Fig. 5b for directionally
encoding one or several audio signals and decoding them for playback over
headphones.

1. Measuring HRTFs for a set of positions $\{(\theta_p, \varphi_p),\ p = 1, 2, \ldots, P\}$. The sets of
left-ear and right-ear HRTFs will be denoted, respectively, as:

$$\{L(\theta_p, \varphi_p, f)\} \ \text{and} \ \{R(\theta_p, \varphi_p, f)\}, \quad p = 1, 2, \ldots, P,$$

where $f$ denotes frequency.

2. Extracting the left and right delays $t_L(\theta_p, \varphi_p)$ and $t_R(\theta_p, \varphi_p)$ for every position.
Denoting by $T(\theta, \varphi, f) = \exp(2\pi j f\, t(\theta, \varphi))$ the time-delay operator of duration $t$,
expressed in the frequency domain, the left-ear and right-ear HRTFs are expressed
by:

$$L(\theta_p, \varphi_p, f) = T_L(\theta_p, \varphi_p, f)\, \bar{L}(\theta_p, \varphi_p, f),$$

$$R(\theta_p, \varphi_p, f) = T_R(\theta_p, \varphi_p, f)\, \bar{R}(\theta_p, \varphi_p, f), \quad p = 1, 2, \ldots, P,$$

where $\bar{L}$ and $\bar{R}$ denote the delay-free HRTFs.






3. Equalization, whereby a common transfer function is removed from all HRTFs measured on
one ear. This transfer function can include the effect of the measuring apparatus,
loudspeaker, and microphones used. It can also be the delay-free HRTF $\bar{L}$ (or $\bar{R}$)
measured for one particular direction (free-field equalization), or a transfer
function representing an average of all the delay-free HRTFs $\bar{L}$ (or $\bar{R}$) measured
over all positions (diffuse-field equalization).

4. Symmetrization, whereby the HRTFs and the delays are corrected in order to
satisfy the natural left-right symmetry relations:

$$\bar{R}(\theta, \varphi, f) = \bar{L}(2\pi - \theta, \varphi, f) \quad \text{and} \quad t_L(\theta, \varphi) = t_R(2\pi - \theta, \varphi).$$

5. Derivation of the set of reconstruction filters $\{L_i(f)\}$ and $\{R_i(f)\}$ satisfying the
approximate equations:

$$\bar{L}(\theta_p, \varphi_p, f) \approx \sum_{i=0}^{N-1} g_i(\theta_p, \varphi_p)\, L_i(f),$$

$$\bar{R}(\theta_p, \varphi_p, f) \approx \sum_{i=0}^{N-1} g_i(\theta_p, \varphi_p)\, R_i(f), \quad p = 1, 2, \ldots, P.$$

In practice, the measured HRTFs are obtained in the digital domain. Each HRTF is
represented as a complex frequency response sampled at a given number of
frequencies over a limited frequency range, or, equivalently, as a temporal impulse
response sampled at a given sample rate. The HRTF set $\{\bar{L}(\theta_p, \varphi_p, f)\}$ or $\{\bar{R}(\theta_p, \varphi_p, f)\}$
is represented, in the above decomposition, as a complex function of frequency in
which every sample is a function of the spatial variables $\theta$ and $\varphi$, and this function is
represented as a weighted combination of the spatial functions $g_i(\theta, \varphi)$. As a result, a
sampled complex function of frequency is associated with each spatial function $g_i(\theta, \varphi)$,
which defines the sampled frequency response of the corresponding filter $L_i(f)$ or $R_i(f)$.
It is noted that, due to the linearity of the Fourier transform, an equivalent
decomposition would be obtained if the frequency variable $f$ were replaced by the
time variable in order to reconstruct the time-domain representation of the HRTF.

The equalization and the symmetrization of the HRTF sets $\bar{L}(\theta_p, \varphi_p, f)$ and $\bar{R}(\theta_p, \varphi_p, f)$
are not necessary for carrying out the invention. However, performing these operations
eliminates some of the artifacts associated with the HRTF measurement method. Thus,
it may be preferable to perform these operations for their practical advantages.
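Step 3 lists diffuse-field equalization as one option. One simple way to do it (an assumption here, not necessarily the inventors' procedure) is to divide every HRTF of one ear by the root-mean-square magnitude of all HRTFs of that ear, optionally weighted by solid angle. For brevity, the sketch uses a zero-phase (magnitude-only) average. In practice, a minimum-phase version of the average could be used instead. The function name is hypothetical.

```python
import numpy as np

def diffuse_field_equalize(hrtfs, weights=None):
    """Divide one ear's HRTFs by their (weighted) RMS magnitude over directions.

    hrtfs:   (P, F) complex frequency responses, one row per measured direction
    weights: optional (P,) solid-angle weights; uniform if omitted
    Returns the equalized HRTFs and the (F,) magnitude-only average removed.
    """
    hrtfs = np.asarray(hrtfs, dtype=complex)
    p = hrtfs.shape[0]
    if weights is None:
        w = np.full(p, 1.0 / p)
    else:
        w = np.asarray(weights, dtype=float) / np.sum(weights)
    average = np.sqrt(np.sum(w[:, None] * np.abs(hrtfs) ** 2, axis=0))
    return hrtfs / average, average
```

Any magnitude response shared by all directions, such as that of the measuring chain, cancels out. Only the direction-dependent part remains.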
Step 2 is optional and is associated with the binaural synthesis topologies described in
Figs. 5a and 5b, where the delays $t_L(\theta, \varphi)$ and $t_R(\theta, \varphi)$ are introduced in the directional
encoding module for each sound source. If step 2 is not applied, the binaural synthesis
topologies of Figs. 4a and 4b can be used. If the delay extraction procedure is
appropriately performed (as discussed below), the topologies of Figs. 5a and 5b will
provide a higher fidelity with fewer encoding channels. It will be noted that adding or
subtracting a common delay offset to $t_L(\theta, \varphi)$ and $t_R(\theta, \varphi)$ in the encoding module will
have no effect on the perceived direction of sounds during playback, even if the
delay offset varies with direction, as long as the interaural time delay difference
(ITD), defined below, is preserved for each direction.

$$\mathrm{ITD}(\theta, \varphi) = t_R(\theta, \varphi) - t_L(\theta, \varphi).$$
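The following sketch shows the encoder and headphone decoder of Fig. 5a. It assumes integer delays $t_L$ and $t_R$ in samples, and panning weights $g_i(\theta, \varphi)$ already evaluated for the source direction. It also assumes the reconstruction filters $L_i(f)$ and $R_i(f)$ are given as FIR responses in the time domain. Function names are hypothetical. Several sources are mixed simply by adding their encoded channel arrays before decoding.

```python
import numpy as np

def encode_source(x, gains, delay_left, delay_right, length):
    """Encode one source into N left-path and N right-path channels (Fig. 5a style).

    gains:      (N,) panning weights g_i(theta, phi) for the source direction
    delay_left: integer delay t_L in samples; delay_right likewise for t_R
    length:     samples per output channel; must be >= max delay + len(x)
    """
    x = np.asarray(x, dtype=float)
    gains = np.asarray(gains, dtype=float)

    def delayed(d):
        out = np.zeros(length)
        out[d:d + len(x)] = x
        return out

    left = np.outer(gains, delayed(delay_left))    # (N, length)
    right = np.outer(gains, delayed(delay_right))  # (N, length)
    return left, right

def decode_headphones(left_channels, right_channels, filters_left, filters_right):
    """Filter each encoded channel with its reconstruction filter and sum per ear."""
    left = sum(np.convolve(c, h) for c, h in zip(left_channels, filters_left))
    right = sum(np.convolve(c, h) for c, h in zip(right_channels, filters_right))
    return left, right
```

For a single source, each ear's output equals the delayed source filtered by the modeled delay-free HRTF, the sum over i of g_i L_i (or g_i R_i). Only the delays carry the ITD.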

It is noted that the above procedure differs from the methods of the prior art. 
Conventional analytical techniques, such as PCA and SVD, simultaneously produce 
the spectral functions and the spatial functions which minimize the least-squares error 
between the original HRTFs and the reconstructed HRTFs for a given number of 
channels N. In developing the present invention, it was recognized in particular
that these earlier methods suffer from the following drawbacks:

- The spatial panning functions cannot be chosen a priori. 

- The choice of error criterion to be minimized (mean squared error) enables the 
resolution of the approximation problem via tractable linear algebra. However, the 
technique does not guarantee that the model of the HRTF thus obtained is optimal 
in terms of perceived reproduction for a given number of encoding channels. 

In comparison, the technique in accordance with the present invention permits a priori 
selection of the spatial functions, from which the spectral functions are derived. As 
will be apparent from the following description, several benefits of the present 
invention will result from the possibility of choosing the panning functions a priori 
and from using a variety of techniques to derive the associated reconstruction filters. 

An immediate advantage of the invention is that the encoding format in which sounds 
are mixed in Fig. 5a is devoid of listener-specific features. As discussed below, it is
possible, without causing major degradations in reproduction fidelity, to use a 
listener-independent model of the ITD in carrying out the invention. 
Generally, it is possible to make a selection of spatial panning functions and tune the
reconstruction filters to achieve practical advantages such as:

- enabling improved reproduction over multi-channel loudspeaker systems,

- enabling the production of microphone recordings,

- preserving a high fidelity of reproduction in chosen directions or regions of space
even with a low number of channels.

Two particular choices of spatial panning functions will be detailed in this 
description: spherical harmonic functions and discrete panning functions. Practical 
methods for designing the set of reconstruction filters $L_i(f)$ and $R_i(f)$ will be described
in more detail. From the discussion which follows, it will be clear to a person of 
ordinary skill in the relevant art that other spatial functions can be used and that 
alternative techniques for producing the corresponding reconstruction filters are 
available. 

Delay extraction techniques 

The extraction of the interaural time delay difference, $\mathrm{ITD}(\theta_p, \varphi_p)$, from the HRTF pair
$L(\theta_p, \varphi_p, f)$ and $R(\theta_p, \varphi_p, f)$ is performed as follows.

Any transfer function $H(f)$ can be uniquely decomposed into its all-pass component
and its minimum-phase component as follows:

$$H(f) = \exp(j\, \psi(f))\, H_{\min}(f),$$

where $\psi(f)$, called the excess-phase function of $H(f)$, is defined by

$$\psi(f) = \mathrm{Arg}(H(f)) - \mathrm{Re}\big(\mathrm{Hilbert}(-\mathrm{Log}\,|H(f)|)\big).$$
Applying this decomposition to the HRTFs $L(\theta_p, \varphi_p, f)$ and $R(\theta_p, \varphi_p, f)$, we obtain the
corresponding excess-phase functions, $\psi_L(\theta_p, \varphi_p, f)$ and $\psi_R(\theta_p, \varphi_p, f)$, and the
corresponding minimum-phase HRTFs, $L_{\min}(\theta_p, \varphi_p, f)$ and $R_{\min}(\theta_p, \varphi_p, f)$.

The interaural time delay difference, $\mathrm{ITD}(\theta_p, \varphi_p)$, can be defined, for each direction
$(\theta_p, \varphi_p)$, by a linear approximation of the interaural excess-phase difference:

$$\psi_R(\theta, \varphi, f) - \psi_L(\theta, \varphi, f) \approx 2\pi f\, \mathrm{ITD}(\theta, \varphi).$$
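The following NumPy sketch illustrates the linear-fit definition above. It obtains the minimum-phase spectrum by the standard real-cepstrum (folding) method, which implements the Hilbert-transform relation between log-magnitude and minimum phase. It then takes the excess phase as the phase of $H / H_{\min}$ and fits a line to it from DC to Nyquist. Under NumPy's FFT convention, a delay of d samples appears as a phase of -omega * d, so the sketch negates the fitted slope. FFT size and function names are assumptions.

```python
import numpy as np

def minimum_phase_spectrum(h, n_fft):
    """Minimum-phase counterpart of an FIR response, via the real cepstrum.

    n_fft is assumed even and much longer than h to limit cepstral aliasing.
    """
    spectrum = np.fft.fft(h, n_fft)
    log_mag = np.log(np.maximum(np.abs(spectrum), 1e-12))
    cepstrum = np.fft.ifft(log_mag).real
    fold = np.zeros(n_fft)          # keep quefrencies 0 and n_fft/2, double the causal part
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    return np.exp(np.fft.fft(cepstrum * fold))

def excess_phase_delay(h, n_fft=512):
    """Delay in samples from a linear fit of the excess phase over 0..Nyquist."""
    spectrum = np.fft.fft(h, n_fft)
    excess = np.unwrap(np.angle(spectrum / minimum_phase_spectrum(h, n_fft)))
    half = n_fft // 2 + 1
    omega = 2.0 * np.pi * np.arange(half) / n_fft      # radians per sample
    slope = np.polyfit(omega, excess[:half], 1)[0]
    return -slope   # NumPy's FFT maps a delay of d samples to a phase of -omega * d

def itd_samples(h_left, h_right, n_fft=512):
    """ITD = t_R - t_L, estimated from the two excess-phase slopes."""
    return excess_phase_delay(h_right, n_fft) - excess_phase_delay(h_left, n_fft)
```

For measured HRTFs the excess phase is only approximately linear, so the fit returns a fractional delay. That value could then be rounded or modeled as described below.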
In practice, this approximation may be replaced by various alternative methods of
estimating the ITD, including time-domain methods such as methods using the
cross-correlation function of the left and right HRTFs or methods using a threshold
detection technique to estimate an arrival time at each ear. Another possibility is to
use a formula for modeling the variation of ITD vs. direction. For instance,

• the spherical head model with diametrically opposite ears yields

$$\mathrm{ITD}(\theta, \varphi) = \frac{r}{c}\left[\arcsin(\cos(\varphi)\sin(\theta)) + \cos(\varphi)\sin(\theta)\right],$$

• the free-field model, where the ears are represented by two points separated by
the distance $2r$, yields

$$\mathrm{ITD}(\theta, \varphi) = \frac{2r}{c}\cos(\varphi)\sin(\theta),$$

where $c$ denotes the speed of sound. In these two formulas, the value of the radius $r$
can be chosen so that $\mathrm{ITD}(\theta_p, \varphi_p)$ is as large as possible without exceeding the value
derived from the linear approximation of the interaural excess-phase difference. In a
digital implementation, the value of $\mathrm{ITD}(\theta_p, \varphi_p)$ can be rounded to the closest integer
number of samples, or the interaural excess-phase difference may be approximated by
the combination of a delay unit and a digital all-pass filter.
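The two ITD models and the rounding step translate directly into code. This sketch makes two stated assumptions: a speed of sound of 343 m/s and, in the test, a head radius of 0.0875 m. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value

def itd_spherical_head(azimuth, elevation, radius, c=SPEED_OF_SOUND):
    """Spherical-head model with diametrically opposite ears, in seconds."""
    s = math.cos(elevation) * math.sin(azimuth)
    return radius / c * (math.asin(s) + s)

def itd_free_field(azimuth, elevation, radius, c=SPEED_OF_SOUND):
    """Free-field model: two points separated by 2 * radius, in seconds."""
    return 2.0 * radius / c * math.cos(elevation) * math.sin(azimuth)

def itd_in_samples(itd_seconds, sample_rate):
    """Round an ITD to the closest integer number of samples."""
    return round(itd_seconds * sample_rate)
```

For the same radius, the spherical-head model gives the larger ITD for lateral sources. This is because arcsin(s) >= s for s >= 0, so the head model accounts for the longer path around the sphere.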

The delay-free HRTFs, $\bar{L}(\theta_p, \varphi_p, f)$ and $\bar{R}(\theta_p, \varphi_p, f)$, from which the reconstruction
filters $L_i(f)$ and $R_i(f)$ will be derived, can be identical, respectively, to the
minimum-phase HRTFs $L_{\min}(\theta_p, \varphi_p, f)$ and $R_{\min}(\theta_p, \varphi_p, f)$.

Whatever the method used to extract or model the interaural time delay difference
from the measured HRTF, it can be regarded as an approximation of the interaural
excess-phase difference $\psi_R(\theta, \varphi, f) - \psi_L(\theta, \varphi, f)$ by a model function $\hat{\psi}(\theta, \varphi, f)$:

$$\psi_R(\theta, \varphi, f) - \psi_L(\theta, \varphi, f) \approx \hat{\psi}(\theta, \varphi, f).$$

It may be advantageous, in order to improve the fidelity of the 3-D audio reproduction
according to the present invention, to correct for the error made in this phase
difference approximation by incorporating the residual excess-phase difference into
the delay-free HRTFs $\bar{L}(\theta_p, \varphi_p, f)$ and $\bar{R}(\theta_p, \varphi_p, f)$ as follows:

$$\bar{L}(f) = L_{\min}(f)\, \exp(j\, \phi_L(f)) \quad \text{and} \quad \bar{R}(f) = R_{\min}(f)\, \exp(j\, \phi_R(f)),$$

where $\phi_L(f)$ and $\phi_R(f)$ satisfy

$$\phi_R(f) - \phi_L(f) = \psi_R(f) - \psi_L(f) - \hat{\psi}(f),$$

and either $\phi_L(f) = 0$ or $\phi_R(f) = 0$, as appropriate to ensure that the delay-free HRTFs
$\bar{L}(\theta_p, \varphi_p, f)$ and $\bar{R}(\theta_p, \varphi_p, f)$ are causal transfer functions.

Application of spherical harmonic functions for encoding and recording 

General definition of spherical harmonics. 

Of particular interest in the following description are the zero-order harmonic W and 
the first-order harmonics X, Y and Z defined earlier, as well as the second-order 
harmonics, U and V, and the third-order harmonics, S and T, defined below. 

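$$U(\theta, \varphi) = \cos^2(\varphi)\cos(2\theta)$$

$$V(\theta, \varphi) = \cos^2(\varphi)\sin(2\theta)$$

$$S(\theta, \varphi) = \cos^3(\varphi)\cos(3\theta)$$

$$T(\theta, \varphi) = \cos^3(\varphi)\sin(3\theta)$$
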
Advantages of spherical harmonics include:

- they are mathematically tractable and available in closed form, which allows
interpolation between directions;

- they are mutually orthogonal;

- they have a spatial interpretation (e.g. front-back difference);

- they facilitate recording.
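The orthogonality claim can be checked numerically for W through T. This sketch approximates the normalized integral $\frac{1}{4\pi}\iint g_i\, g_k \cos(\varphi)\, d\theta\, d\varphi$ with a midpoint rule on a regular azimuth-elevation grid. The grid sizes and function names are arbitrary illustrative choices.

```python
import numpy as np

def harmonics(theta, phi):
    """W, X, Y, Z, U, V, S, T as defined in the text (theta azimuth, phi elevation)."""
    c = np.cos(phi)
    return np.stack([
        np.ones_like(theta),
        c * np.cos(theta), c * np.sin(theta), np.sin(phi),
        c**2 * np.cos(2 * theta), c**2 * np.sin(2 * theta),
        c**3 * np.cos(3 * theta), c**3 * np.sin(3 * theta),
    ])

def gram_matrix(n_theta=72, n_phi=36):
    """Midpoint-rule approximation of (1 / 4pi) * integral of g_i g_k cos(phi)."""
    theta = (np.arange(n_theta) + 0.5) * 2 * np.pi / n_theta
    phi = -np.pi / 2 + (np.arange(n_phi) + 0.5) * np.pi / n_phi
    th, ph = np.meshgrid(theta, phi)
    weight = np.cos(ph) * (2 * np.pi / n_theta) * (np.pi / n_phi) / (4 * np.pi)
    g = harmonics(th, ph).reshape(8, -1)
    return (g * weight.reshape(-1)) @ g.T
```

The off-diagonal entries vanish to rounding error. The diagonal approaches 1 for W and 1/3 for each of X, Y and Z.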

Fig. 6 illustrates this method in the case where the minimum-phase HRTFs are 
decomposed over spherical harmonics limited to zero and first order. The directional 
encoding of the input signal produces an 8-channel encoded signal herein referred to as
a "Binaural B Format" encoded signal. The mixer provides for mixing of additional 
source signals, including synthesized sources. Conversely, 8 filters are used to decode 
this format into a binaural output signal. The method can be extended to include any 
or all of the above higher-order spherical harmonics. Using the higher orders provides 
for more accurate reconstruction of HRTFs, especially at high frequencies (above 3 
kHz). 

As discussed above, a Soundfield microphone produces B-Format encoded signals. As
such, a Soundfield microphone can be characterized by a set of spherical harmonic
functions. Thus, from Fig. 6, it can be seen that encoding a sound in accordance with
the invention to produce Binaural B Format encoded signals simulates a free-field
recording using two Soundfield microphones located at the notional positions of the
two ears. This simulation is exact if the directional encoder provides ITD according to
the following free-field model:

$$\mathrm{ITD}(\theta, \varphi) = t_R(\theta, \varphi) - t_L(\theta, \varphi) = \frac{d}{c}\cos(\varphi)\sin(\theta),$$

where $d$ is the distance between the microphones. If the ITD model provided in the
encoder takes into account the diffraction of sound around the head or a sphere, the 
encoded signal and the recorded signal will differ in the value of the ITD for sounds 
away from the median plane. This difference can be reduced, in practice, by adjusting 
the distance between the two microphones to be slightly larger than the distance 
between the two ears of the listener. 

The Binaural B Format recording technique is compatible with currently existing
8-channel digital recording technology. The recording can be decoded for reproduction
over headphones through the bank of 8 filters $L_i(f)$ and $R_i(f)$ shown in Fig. 6, or
decoded over two or more loudspeakers using methods to be described below. Before
decoding, additional sources can be encoded in Binaural B Format and mixed into the
recording.

The Binaural B Format offers the additional advantage that the set of four left or right 
channels can be used with conventional Ambisonic decoders for loudspeaker 
playback. Other advantages of using spherical harmonics as the spatial panning 
functions in carrying out the invention will be apparent in connection with multi-channel
loudspeaker playback, offering an improved fidelity of 3-D audio reproduction 
compared to Ambisonic techniques. 

Derivation of the reconstruction filters 

For clarity, the derivation of the N reconstruction filters $L_i(f)$ will be illustrated in the
case where the spatial panning functions $g_i(\theta_p, \varphi_p)$ are spherical harmonics. However,
the methods described are general and apply regardless of the choice of spatial
functions.

The problem is to find, for a given frequency (or time) $f$, a set of complex scalars $L_i(f)$
so that the linear combination of the spatial functions $g_i(\theta_p, \varphi_p)$ weighted by the $L_i(f)$
approximates the spatial variation of the HRTF $\bar{L}(\theta_p, \varphi_p, f)$ at that frequency (or time).
This problem can be conveniently represented by the matrix equation

$$\mathbf{L} = \mathbf{G}\, \hat{\mathbf{L}},$$

where

• the set of HRTFs $\bar{L}(\theta_p, \varphi_p, f)$ defines the $P \times 1$ vector $\mathbf{L}$, $P$ being the number of
spatial directions;

• each spatial panning function $g_i(\theta_p, \varphi_p)$ defines the $P \times 1$ vector $\mathbf{G}_i$, and the matrix $\mathbf{G}$
is the $P \times N$ matrix whose columns are the vectors $\mathbf{G}_i$;

• the set of reconstruction filters $L_i(f)$ defines the $N \times 1$ vector of unknowns $\hat{\mathbf{L}}$.

The solution which minimizes the energy of the error is given by the pseudo-inversion

$$\hat{\mathbf{L}} = (\mathbf{G}^T \mathbf{G})^{-1} \mathbf{G}^T \mathbf{L},$$

where $\mathbf{G}^T \mathbf{G}$, known as the Gram matrix, is the $N \times N$ matrix formed by the dot
products $G(i, k) = \mathbf{G}_i^T \mathbf{G}_k$ of the spatial vectors. The Gram matrix is diagonal if the
spatial vectors are mutually orthogonal.
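The following minimal NumPy sketch implements the pseudo-inversion, solving all frequency bins at once. It assumes the delay-free HRTFs of one ear are stored as a P x F complex array (one column per frequency bin) and that G has full column rank. The function name is hypothetical.

```python
import numpy as np

def reconstruction_filters(G, hrtfs):
    """Least-squares filters L_i(f) for all frequency bins at once.

    G:     (P, N) real matrix, G[p, i] = g_i(theta_p, phi_p)
    hrtfs: (P, F) complex delay-free HRTFs, one column per frequency bin
    Returns an (N, F) complex array; column f minimizes ||G l - hrtfs[:, f]||^2.
    """
    G = np.asarray(G, dtype=float)
    gram = G.T @ G                                  # N x N Gram matrix
    return np.linalg.solve(gram, G.T @ np.asarray(hrtfs, dtype=complex))
```

Consider 12 directions evenly spaced in azimuth with the functions W, X and Y. The Gram matrix is then diag(12, 6, 6). The solution therefore reduces to an orthogonal projection: each dot product is divided by the squared norm of its spatial vector.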

In the simplest case, the sampled spatial functions are mutually orthogonal, and the filters
are derived by orthogonal projection of the HRTFs onto the individual spatial functions (a dot
product computed at each frequency). An example is 2-D reproduction with regular
azimuth sampling. If the sampled functions are not mutually orthogonal, multiplying by the
inverse of the Gram matrix ensures correct reconstruction.

Even when the panning functions $g_i(\theta, \varphi)$ are mutually orthogonal, as is the case with
spherical harmonics, the vectors $\mathbf{G}_i$ obtained by sampling these functions may not be
orthogonal. This typically happens if the spatial sampling is not uniform (as is often
the case with 3-D HRTF measurements). This problem can be remedied by redefining
the spatial dot product so as to approximate the continuous integral of the product of
two spatial functions

< gi> gk >= 1/(4tc) U W g,(6, <p) g k (6, (f) cos(<&) d6 dp 

by

$$\langle g_i, g_k \rangle = \sum_{p=1}^{P} g_i(\theta_p, \varphi_p)\, g_k(\theta_p, \varphi_p)\, dS(p) = \mathbf{G}_i^T \mathbf{A}\, \mathbf{G}_k,$$

where $\mathbf{A}$ is a diagonal $P \times P$ matrix with $A(p, p) = dS(p)$, and $dS(p)$ is proportional to a
notional solid angle covered by the HRTF measured for the direction $(\theta_p, \varphi_p)$. This
definition yields the generalized pseudo inversion equation

$$\hat{\mathbf{L}} = (\mathbf{G}^T \mathbf{A}\, \mathbf{G})^{-1} \mathbf{G}^T \mathbf{A}\, \mathbf{L},$$

where the diagonal matrix $\mathbf{A}$ can be used as a spatial weighting function in order to
achieve a more accurate 3-D audio reproduction in certain regions of space compared
to others, and the modified Gram matrix $\mathbf{G}^T \mathbf{A}\, \mathbf{G}$ ensures that the solution minimizes
the weighted mean squared error.
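To illustrate why the weighting matters, the sketch below uses a hypothetical 2-D grid that is denser in front, with arc lengths standing in for the solid angles $dS(p)$ (each weight is half the angular gap to its two neighbours, an assumed rule). It compares the normalized $W$/$X$ dot product obtained with the plain Gram matrix $\mathbf{G}^T\mathbf{G}$ and with the modified Gram matrix $\mathbf{G}^T\mathbf{A}\,\mathbf{G}$. The filter values would then follow from solving $(\mathbf{G}^T\mathbf{A}\,\mathbf{G})\,\hat{\mathbf{L}} = \mathbf{G}^T\mathbf{A}\,\mathbf{L}$ with any linear solver.

```python
import math

# Hypothetical 2-D measurement grid (degrees), denser in front than behind.
AZ_DEG = [0, 30, 60, 90, 180, 270, 300, 330]


def arc_weights(az_deg):
    """2-D stand-in for dS(p): half the angular gap to each neighbour, in degrees.

    Assumes distinct azimuths; the weights then sum to 360.
    """
    s = sorted(az_deg)
    n = len(s)
    half_gaps = {}
    for k, a in enumerate(s):
        prev_gap = (a - s[k - 1]) % 360
        next_gap = (s[(k + 1) % n] - a) % 360
        half_gaps[a] = 0.5 * (prev_gap + next_gap)
    return [half_gaps[a] for a in az_deg]


def weighted_gram(G, A):
    """Modified Gram matrix G^T A G for the diagonal weights A[0..P-1]."""
    P, N = len(G), len(G[0])
    return [[sum(A[p] * G[p][i] * G[p][k] for p in range(P)) for k in range(N)]
            for i in range(N)]


def correlation(M, i, k):
    """Normalized dot product between spatial vectors i and k (0 means orthogonal)."""
    return M[i][k] / math.sqrt(M[i][i] * M[k][k])


# Columns: W = 1 (assumed normalization), X = cos az, Y = sin az.
G = [[1.0, math.cos(math.radians(a)), math.sin(math.radians(a))] for a in AZ_DEG]
uniform = weighted_gram(G, [1.0] * len(AZ_DEG))    # plain G^T G
weighted = weighted_gram(G, arc_weights(AZ_DEG))  # G^T A G
print(correlation(uniform, 0, 1), correlation(weighted, 0, 1))
```

On this grid the unweighted W/X correlation is roughly 0.48, while the weighted one drops below 0.1, because the weighted sum approximates the continuous integral over the circle, over which W and X are orthogonal.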

Additionally, the HRTF can be projected onto a subset of the chosen set of spatial functions using the
above methods, and the residual error then projected onto the other spatial functions (cf. aes16).
For example, to optimize the fidelity of reconstruction in the horizontal plane, the HRTF is projected
onto W, X and Y first, and the error is then projected onto Z. Note that the process can be iterated in
more than two steps.
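The two-step variant can be sketched as follows, under stated assumptions: a hypothetical grid of three azimuth rings (0°, 30° and 60° elevation, no data below the horizon), first-order functions with $W = 1$, and a synthetic complex "HRTF" at one frequency. On this grid W, X and Y are mutually orthogonal, so stage 1 uses separate projections; stage 2 projects the residual onto Z only. Because $Z = \sin\varphi$ vanishes in the horizontal plane, stage 2 cannot alter the horizontal-plane reconstruction obtained in stage 1.

```python
import math

# Hypothetical grid: 8 azimuths on rings at 0, 30 and 60 degrees elevation
# (nothing below the horizon, so Z is not orthogonal to W on this grid).
DIRS = [(2 * math.pi * a / 8, math.radians(e)) for e in (0, 30, 60) for a in range(8)]


def wxyz(az, el):
    """First-order spatial functions, with an assumed W = 1 normalization."""
    return {"W": 1.0, "X": math.cos(az) * math.cos(el),
            "Y": math.sin(az) * math.cos(el), "Z": math.sin(el)}


def hrtf(az, el):
    """Synthetic stand-in for L(theta, phi, f) at one frequency (not measured data)."""
    return (1.0 + 0.5 * math.cos(az) * math.cos(el) + 0.3j * math.sin(az) * math.cos(el)
            + 0.2 * math.sin(el) + 0.1 * math.cos(2 * az))


def project_one(name, target):
    """Least-squares coefficient of one sampled function for the target values."""
    g = [wxyz(az, el)[name] for az, el in DIRS]
    return sum(gp * t for gp, t in zip(g, target)) / sum(gp * gp for gp in g)


def reconstruct(filters, az, el):
    """Sum over the available channels of g_i(direction) * L_i(f)."""
    gains = wxyz(az, el)
    return sum(gains[name] * value for name, value in filters.items())


target = [hrtf(az, el) for az, el in DIRS]

# Stage 1: W, X and Y are mutually orthogonal on this grid, so separate
# projections give the same result as a joint least-squares fit.
filters = {name: project_one(name, target) for name in ("W", "X", "Y")}
residual = [t - reconstruct(filters, az, el) for t, (az, el) in zip(target, DIRS)]
err1 = sum(abs(r) ** 2 for r in residual)

# Stage 2: project the residual error onto Z.
filters["Z"] = project_one("Z", residual)
err2 = sum(abs(t - reconstruct(filters, az, el)) ** 2 for t, (az, el) in zip(target, DIRS))
print(err1, err2)  # err2 <= err1
```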

By combining the above techniques, it is possible, for a given set of spatial panning 
functions, to achieve control over chosen perceptual aspects of the 3-D audio 
reproduction, such as the front/back or up/down discrimination or the accuracy in 
particular regions of space. 

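Fig. 7 illustrates the performance of the method for reconstructing the HRTF
magnitude spectra in the horizontal plane ($\varphi = 0$). For this reconstruction, only 3
channels per ear are necessary, since the Z channel, which vanishes in the horizontal plane, is not used. The original data are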
channels per ear are necessary, since the Z channel is not used. The original data are 
diffuse-field equalized HRTFs derived from measurements on a dummy head. Due to 
the limitation to first-order harmonics, the reconstruction matches the original 
magnitude spectra reasonably well up to about 2 or 3 kHz, but the performance tends 
to degrade with increasing frequency. For large-scale applications, a gentle 
degradation at high frequencies can be acceptable, since inter-individual differences in 
HRTFs typically become prominent at frequencies above 5 kHz. The frequency 
responses of the reconstruction filters obtained in this case are shown in Fig. 8.



Adaptation of the reconstruction filters to the listener 

An advantage of a recording made in accordance with the invention over a
conventional two-channel dummy head recording is that, unlike prior art encoded
signals, Binaural B Format encoded signals do not contain spectral HRTF features.
These features are only introduced at the decoding stage by the reconstruction filters
$L_i(f)$. Contrary to a conventional binaural recording, a Binaural B Format recording
allows listener-specific adaptation at the reproduction stage, in order to reduce the
occurrence of artifacts such as front-back reversals and in-head or elevated
localization of frontal sound events.

Listener-specific adaptation can be achieved even more effectively in the context of a
real-time digital mixing system. Moreover, the technique of the present invention
readily lends itself to a real-time mixing approach and can be conveniently
implemented, as it only involves the correction of the head radius $r$ for the synthesis of
ITD cues and the adaptation of the four reconstruction filters $L_i(f)$. If diffuse-field
equalization is applied to the headphones and to the measured HRTFs, and therefore to
the reconstruction filters $L_i(f)$, the adaptation only needs to address direction-dependent
features related to the morphology of the listener, rather than variations in
HRTF measurement apparatus and conditions.
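For the ITD part of the adaptation, a minimal sketch is given below. It assumes a spherical-head (Woodworth-style) model in which the ITD scales linearly with the head radius $r$. This model, the speed of sound of 343 m/s and the radius values are assumptions for illustration and may differ from the ITD model used elsewhere in this document.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def itd_seconds(azimuth, elevation, head_radius):
    """Spherical-head (Woodworth-style) ITD estimate; the model itself is an assumption.

    azimuth and elevation are in radians; the lateral angle is the angle between
    the source direction and the median plane. Returns a signed ITD in seconds.
    """
    lateral = math.asin(math.sin(azimuth) * math.cos(elevation))
    return head_radius / SPEED_OF_SOUND * (lateral + math.sin(lateral))


# Illustrative radii in metres: adapting to the listener rescales every delay.
default_r, listener_r = 0.0875, 0.0950
scale = listener_r / default_r
print(itd_seconds(math.pi / 2, 0.0, default_r), itd_seconds(math.pi / 2, 0.0, listener_r))
```

Under this model, adapting to a listener with a different head size amounts to multiplying every synthesized delay by the radius ratio `scale`, while the reconstruction filters are adapted separately.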

Application of discrete panning functions 

Discrete panning functions are defined as functions which minimize the number of non-zero panning
weights for any direction: 2 weights in 2-D and 3 weights in 3-D. For each panning function, there
is a direction where this panning function reaches unity and is the only non-zero
panning function. An example is given in Fig. 1 for the 2-D case; many variations are possible.

An advantage of discrete panning functions is that fewer operations are needed in the encoding
module: multiplying by a panning weight and adding into the mix is only necessary for
the encoding channels which have non-zero weights.

The projection techniques described above can be used to derive the reconstruction 
filters. Alternatively, it can be noted that each discrete panning function covers a 
particular region of space, and admits a "principal direction" (the direction for which 
the panning weight reaches 1). Therefore, a suitable reconstruction filter can be the 
HRTF corresponding to that principal direction. This will guarantee exact 
reconstruction of the HRTF for that particular direction. Alternatively, a combination 
of the principal direction and the nearest directions can be used to derive the 
reconstruction filter. When it is desired to design a 3D audio display system which 
offers maximum fidelity for certain directions of the sound, it is straightforward to
design a set of panning functions which will admit these specific directions as 
principal directions. 
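The encoding saving and the principal-direction reconstruction filters can be sketched together for a hypothetical 2-D case with four channels whose principal directions are spaced 90° apart. The linear crossfade between adjacent channels is an assumed panning law (Fig. 1 may use a different one), and the HRTF values at the principal directions are made-up numbers at a single frequency. The encoder touches at most two channels per sample, and using the HRTF at each principal direction as the reconstruction filter reproduces that HRTF exactly when a source is panned to that direction.

```python
import math


def discrete_pan_weights(azimuth, n):
    """Hypothetical 2-D discrete panning law with principal directions 2*pi*i/n.

    Returns {channel: weight}; at most two weights are non-zero, and channel i
    has weight exactly 1 at its principal direction (linear crossfade assumed).
    """
    step = 2 * math.pi / n
    pos = (azimuth % (2 * math.pi)) / step  # fractional channel index
    i = int(math.floor(pos)) % n
    frac = pos - math.floor(pos)
    if frac < 1e-12:                        # at principal direction i
        return {i: 1.0}
    if frac > 1.0 - 1e-12:                  # numerically at principal direction i + 1
        return {(i + 1) % n: 1.0}
    return {i: 1.0 - frac, (i + 1) % n: frac}


def encode(sample, azimuth, n):
    """Encoder: multiply-and-add only for the channels with non-zero weight."""
    out = [0.0] * n
    for ch, w in discrete_pan_weights(azimuth, n).items():
        out[ch] += w * sample
    return out


def reconstructed_hrtf(azimuth, filters):
    """Sum of g_i(direction) * L_i(f) at one frequency."""
    w = discrete_pan_weights(azimuth, len(filters))
    return sum(wi * filters[ch] for ch, wi in w.items())


# Made-up HRTF values at the 4 principal directions, used directly as L_i(f).
hrtf_at_principal = [1.0 + 0.0j, 0.6 - 0.4j, 0.3 + 0.1j, 0.7 + 0.5j]
print(encode(2.0, math.pi / 4, 4))                    # only channels 0 and 1 are fed
print(reconstructed_hrtf(math.pi, hrtf_at_principal))  # equals hrtf_at_principal[2]
```

Between principal directions, the reconstructed HRTF is the weighted mix of the two neighbouring filters under this assumed law; the projection techniques described above remain an alternative way to derive the filters.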

Methods for playback over loudspeakers 

When used in the topologies of Figs. 5a and 5b, the set of reconstruction filters 
obtained according to the present invention will provide a two-channel output signal 
suitable for high-fidelity 3D audio playback over headphones. As illustrated in Fig. 3, 
this two-channel signal can be further processed through a cross-talk cancellation
network in order to provide a two-channel signal suitable for playback over two 
loudspeakers placed in front of the listener. This technique can produce convincing 
lateral sound images over a frontal pair of loudspeakers, covering azimuths up to 
about ±120°. However, lateral sound images tend to collapse into the loudspeakers in 
response to rotations and translations of the listener's head. The technique is also less 
effective for sound events assigned to rear or elevated positions, even when the 
listener sits at the "sweet spot". 

Fig. 9 illustrates how, in the case of spherical harmonic panning functions, the
reconstruction filters $L_i(f)$ can be utilized to provide improved reproduction over
multi-channel loudspeaker playback systems. An advantage of the Binaural B Format
is that it contains information for discriminating rear sounds from frontal sounds. This
property can be exploited in order to overcome the limitations of 2-channel transaural
reproduction, by decoding over a 4-channel loudspeaker setup. The 4-channel
decoding network, shown in Fig. 9, makes use of the sum and difference of the W and
X signals.

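The binaural signal is decomposed as follows:

$$L(\theta, \varphi, f) = LF(\theta, \varphi, f) + LB(\theta, \varphi, f),$$

where LF and LB are the "front" and "back" binaural signals, defined by:

$$LF(\theta, \varphi, f) = 0.5\,\{[W(\theta, \varphi) + X(\theta, \varphi)]\,[L_W(f) + L_X(f)] + Y(\theta, \varphi)\,L_Y(f) + Z(\theta, \varphi)\,L_Z(f)\}$$

$$LB(\theta, \varphi, f) = 0.5\,\{[W(\theta, \varphi) - X(\theta, \varphi)]\,[L_W(f) - L_X(f)] + Y(\theta, \varphi)\,L_Y(f) + Z(\theta, \varphi)\,L_Z(f)\}$$

It can be verified that LB = 0 for $(\theta, \varphi) = (0, 0)$, where $W = X$ and $Y = Z = 0$, and that LF = 0 for
$(\theta, \varphi) = (\pi, 0)$, where $W = -X$ and $Y = Z = 0$. The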
network of Fig. 9 is designed to eliminate front-back confusions, by reproducing 
frontal sounds over the front loudspeakers and rear sounds over the rear loudspeakers, 
while elevated or lateral sounds are reproduced via both pairs of loudspeakers. This 
significantly improves the reproduction of lateral, rear or elevated sound images 
compared to a 2-channel loudspeaker setup (or to 4-channel loudspeaker reproduction 
using conventional pairwise amplitude panning or Ambisonic techniques). The 
listener is also allowed to move more freely than with 2-channel loudspeaker 
reproduction. By exploiting the Z component, a similar approach can be used to 
decode the binaural B format over a 3-D loudspeaker setup (comprising loudspeakers 
above or below the horizontal plane). 
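The decomposition can be checked numerically at one frequency with the sketch below. It assumes first-order gains $W = 1$, $X = \cos\theta\cos\varphi$, $Y = \sin\theta\cos\varphi$, $Z = \sin\varphi$ (the $W = 1$ normalization is what makes $W = X$ in the frontal direction) and hypothetical filter values $L_W, L_X, L_Y, L_Z$. It verifies that LB vanishes in front, that LF vanishes behind, and that LF + LB equals the full decoded signal.

```python
import math


def bformat_gains(az, el):
    """First-order gains (W, X, Y, Z) with an assumed W = 1 normalization."""
    return (1.0, math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))


def front_back(az, el, Lw, Lx, Ly, Lz):
    """Split the decoded left-ear signal into LF and LB at one frequency."""
    W, X, Y, Z = bformat_gains(az, el)
    lf = 0.5 * ((W + X) * (Lw + Lx) + Y * Ly + Z * Lz)
    lb = 0.5 * ((W - X) * (Lw - Lx) + Y * Ly + Z * Lz)
    return lf, lb


# Hypothetical filter values L_W, L_X, L_Y, L_Z at one frequency.
F = (0.8 + 0.1j, 0.4 - 0.2j, 0.3j, 0.1 + 0.05j)
print(front_back(0.0, 0.0, *F))          # front: LB is zero
print(front_back(math.pi, 0.0, *F))      # rear: LF is zero up to rounding
print(front_back(math.pi / 2, 0.0, *F))  # lateral: both pairs are fed
```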

Fig. 10 illustrates how the present invention, applied with discrete panning functions,
can be advantageously used to provide three-dimensional audio playback over two
loudspeakers placed in front of the listener, with cross-talk cancellation. In this
implementation of the invention, the discrete panning functions $g_1(\theta, \varphi)$ and $g_2(\theta, \varphi)$
are chosen so that their principal directions coincide, respectively, with the directions
of the left and right loudspeakers from the listener's head (the principal direction of
the discrete panning function $g_i(\theta, \varphi)$ is defined as the direction $(\theta_i, \varphi_i)$ verifying
$g_i(\theta_i, \varphi_i) = 1.0$ and $g_j(\theta_i, \varphi_i) = 0$ for $j \neq i$). Furthermore, the reconstruction filters and the cross-talk
cancellation networks are free-field equalized, for each ear, with respect to the
direction of the closest loudspeaker. As a result of these conditions, it can be verified
that, if an audio signal is panned to the direction of one of the two loudspeakers, it is
fed with no modification to that loudspeaker and cancelled out from the output
feeding the other loudspeaker. Therefore, the resulting loudspeaker playback system
combines, in conjunction with the previously described advantages of the present
invention, the advantages of conventional discrete panning systems and the advantages
of binaural reproduction techniques using cross-talk cancellation.
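The cancellation property can be illustrated at one frequency with a symmetric 2x2 model, as a sketch under stated assumptions: each ear's transfer from its nearest loudspeaker is free-field equalized to 1, the transfer from the opposite loudspeaker is a hypothetical complex ratio `c` (a contralateral-to-ipsilateral HRTF ratio with the interaural delay folded in), and a source panned to the left loudspeaker direction produces the binaural pair (1, c). Inverting the speaker-to-ear matrix then yields the loudspeaker feeds (1, 0).

```python
def crosstalk_canceller(c):
    """Inverse of the symmetric speaker-to-ear matrix H = [[1, c], [c, 1]].

    Rows of H are ears (left, right), columns are loudspeakers (left, right).
    Ipsilateral paths are free-field equalized to 1; c is a hypothetical
    contralateral/ipsilateral transfer ratio at one frequency (delay included).
    """
    det = 1.0 - c * c
    return [[1.0 / det, -c / det], [-c / det, 1.0 / det]]


def apply(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


c = 0.45 - 0.30j                           # hypothetical value, |c| < 1
binaural = [1.0, c]                        # source panned to the left loudspeaker direction
feeds = apply(crosstalk_canceller(c), binaural)
ears = apply([[1.0, c], [c, 1.0]], feeds)  # acoustic paths back to the ears
print(feeds)  # approximately [1, 0]: left loudspeaker only, unmodified
```

Any other panning direction produces a binaural pair that is not proportional to (1, c), so both loudspeakers are then fed through the canceller.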

The following notations are used in Fig. 10 and Fig. 11:

• $L_{i/j}$ denotes the ratio of two delay-free HRTFs:

$$L_{i/j}(f) = L(\theta_i, \varphi_i, f) / L(\theta_j, \varphi_j, f);$$

• $\tilde{L}_{i/j}$ denotes the ratio of two delay-free HRTFs combined with the time difference
between them:

$$\tilde{L}_{i/j}(f) = \exp\big(2\pi \mathrm{j} f\,[t(\theta_i, \varphi_i) - t(\theta_j, \varphi_j)]\big)\, L(\theta_i, \varphi_i, f) / L(\theta_j, \varphi_j, f).$$

Fig. 11 illustrates how the decoder of Fig. 10 can be modified to offer further 
improved three-dimensional audio reproduction over four loudspeakers arranged in a 
front pair and a rear pair. The method used is similar to the method used in the system 
of Fig. 9, in that a front cross-talk canceller and a rear cross-talk canceller are used, 
and they receive different combinations of the left and right encoded signals. These 
combinations are designed so that frontal sounds are reproduced over the front 
loudspeakers and rear sounds are reproduced over the rear loudspeakers, while 
elevated or lateral sounds are reproduced via both pairs of loudspeakers. Fig. 11 
shows an embodiment of the present invention using 6 encoding channels for each ear,
where channels 1 and 2 are front left and right channels, channels 5 and 4 are rear left
and right channels, and channels 3 and 6 are lateral and/or elevated channels. A
particularly advantageous property of this embodiment is that, if an audio signal is
panned towards the direction of one of the four loudspeakers (corresponding to the 
principal direction of one of the channels 1, 2, 4, or 5), it is fed with no modification 
to that loudspeaker and cancelled out from the output feeding the three other 
loudspeakers. It is noted that, generally, the systems of Fig. 10 or Fig. 11 can be 
extended to include larger numbers of encoding channels without departing from the 
principles characterizing the present invention, and that, among these encoding 
channels, one or more can have their principal direction outside of the horizontal 
plane so as to provide the reproduction of elevated sounds or of sounds located below 
the horizontal plane. 
What is claimed is:

1. A method for positioning of an audio signal comprising steps of:
selecting a set of spatial functions, each having an associated scaling factor;
providing a first set of amplifiers and a second set of amplifiers, the gains of
the amplifiers being a function of the scaling factors;
receiving a first audio signal;

providing a direction representing the direction of the source of the first audio
signal;

adjusting the scaling factors depending on the direction; 
applying the first set of amplifiers to the first audio signal to produce first 
encoded signals; 

delaying the first audio signal to produce a delayed audio signal; and 
applying the second set of amplifiers to the delayed audio signal to produce 
second encoded signals. 

2. The method of claim 1 wherein the spatial functions are spherical harmonic 
functions. 

3. The method of claim 2 wherein the spherical harmonic functions include at 
least the first-order harmonics. 

4. The method of claim 1 wherein the spatial functions are discrete panning 
functions. 

5. The method of claim 1 wherein for each of the first and second sets of 
amplifiers, the gain of each amplifier is based on the B-format encoding scheme. 

6. The method of claim 1 further including: 

providing a third set of amplifiers and a fourth set of amplifiers, the gains of 
the amplifiers being a function of the scaling factors; 
receiving a second audio signal; 
providing a direction representing the direction of the source of the second
audio signal; 

adjusting the scaling factors depending on the direction; 
applying the third set of amplifiers to the second audio signal to produce third 
encoded signals; 

delaying the second audio signal to produce a second delayed audio signal; 

applying the fourth set of amplifiers to the second delayed audio signal to 
produce fourth encoded signals; 

mixing the first and the third encoded signals, or the first and the fourth 
encoded signals; and 

mixing the second and the fourth encoded signals, or the second and the third 
encoded signals. 

7. The method of claim 6 wherein the second signal is a synthesized audio signal. 

8. The method of claim 1 further including decoding the encoded signals with a
decoder, the decoder comprising filters defined based on the spatial functions.

9. An audio recording apparatus for directionally encoding an audio signal 
comprising: 

a source of an audio signal, the audio signal having a time- varying direction 
associated therewith; 

a first set of multiplier circuits, each having a gain factor adaptable according 
to a direction for the audio signal, each having an input to receive the audio source, 
each having an output; 

a delay element having an input coupled to the audio source and having an 
output; and 

a second set of multiplier circuits, each having a gain factor adaptable 
according to a direction for the audio signal, each having an input to receive the 
output of the delay element, each having an output; 

whereby the outputs of the first and second multiplier circuits comprise 
encoded audio signals. 
10. The apparatus of claim 9 wherein the source includes a source of a synthesized
audio signal. 

11. The apparatus of claim 9 wherein the gain factors of the first and second
multiplier circuits are based on spherical harmonic functions. 

12. The apparatus of claim 11 wherein the spherical harmonic functions include at
least zero- and first-order harmonics. 

13. The apparatus of claim 9 wherein the gain factors of the first and second 
multiplier circuits are based on discrete panning functions. 

14. The apparatus of claim 9 further including a data storage device having an 
interface effective for receiving and storing the outputs of the multiplier circuits. 

15. A 3-dimensional audio recording system comprising: 

a first soundfield microphone to produce first directionally encoded audio 
signals; and 

a second soundfield microphone to produce second directionally encoded 
audio signals; 

the first and second soundfield microphones are proximate each other at the 
positions of the ears of a notional listener; 

wherein the first and second directionally encoded audio signals represent a 3- 
dimensional audio recording. 

16. The system of claim 15 further including a storage device for storing the first
and second directionally encoded audio signals. 

17. The system of claim 16 further including A/D circuitry for converting outputs 
of the microphones to digital signals, whereby the digital signals can be stored on the 
storage device. 
18. The system of claim 15 wherein the first and second microphones are spaced
apart by a distance substantially equal to the width of a human head. 

19. The system of claim 15 wherein the first and second soundfield microphones 
are characterized by a set of spatial functions, the system further including a decoder 
for receiving the first and second directionally encoded signals to produce an audio 
signal, the decoder comprising filters defined based on the spatial functions. 

20. A method of producing an audio signal from directionally encoded audio 
signals comprising steps of: 

receiving directionally encoded audio signals according to a set of spatial 
functions; 

generating a set of spectral functions based on the spatial functions; 

providing a first set of decoding filters defined by left spectral functions; 

providing a second set of decoding filters defined by right spectral functions; 

applying the first decoding filters to the encoded audio signals to produce a 
left-channel audio signal; and 

applying the second decoding filters to the encoded audio signals to produce a 
right-channel audio signal. 

21. The method of claim 20 wherein the set of spatial functions is defined by
$\{g_i(\theta, \varphi),\ i = 0, 1, \ldots, N-1\}$ and the step of generating the spectral functions includes
providing $L_i(f)$ and $R_i(f)$ such that $\sum_{i=0,\ldots,N-1} g_i(\theta_p, \varphi_p)\, L_i(f)$ approximates $L(\theta_p, \varphi_p, f)$
and $\sum_{i=0,\ldots,N-1} g_i(\theta_p, \varphi_p)\, R_i(f)$ approximates $R(\theta_p, \varphi_p, f)$, where $L(\theta_p, \varphi_p, f)$ is a set of
left-ear HRTFs and $R(\theta_p, \varphi_p, f)$ is a set of right-ear HRTFs, where $\{(\theta_p, \varphi_p),\ p = 1, 2,
\ldots, P\}$ is a set of directions and $f$ is frequency.

22. The method of claim 21 wherein $L(\theta_p, \varphi_p, f)$ and $R(\theta_p, \varphi_p, f)$ are delay-free
HRTFs. 

23. The method of claim 21 wherein providing $L_i(f)$ includes solving, at each
frequency $f$, the vector equation $\mathbf{L} = \mathbf{G}\,\hat{\mathbf{L}}$, where:
the set of left-ear HRTFs $L(\theta_p, \varphi_p, f)$ define a $P \times 1$ vector $\mathbf{L}$,
$\mathbf{G}$ is a $P \times N$ matrix whose columns are $P \times 1$ vectors $\mathbf{G}_i$, $i = 0, 1, \ldots, N-1$,
each of the $N$ spatial functions $g_i(\theta_p, \varphi_p)$ defines the vector $\mathbf{G}_i$, and
the set of $L_i(f)$ defines the $N \times 1$ vector $\hat{\mathbf{L}}$.

24. The method of claim 23 wherein providing $L_i(f)$ is obtained by $\hat{\mathbf{L}} = (\mathbf{G}^T \mathbf{G})^{-1} \mathbf{G}^T \mathbf{L}$.

25. The method of claim 24 wherein providing $L_i(f)$ includes projecting a $P \times 1$
vector $\mathbf{L}$ formed by the set of left-ear HRTFs $L(\theta_p, \varphi_p, f)$ over each of $P \times 1$ vectors $\mathbf{G}_i$
formed by the spatial functions $g_i(\theta_p, \varphi_p)$ to compute the scalar product $L_i$.

26. The method according to claim 25 wherein an $N \times 1$ vector formed by the
scalar products $L_i$ is multiplied by the inverse of the Gram matrix $\mathbf{G}^T \mathbf{G}$.

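27. The method of claim 23 wherein providing $L_i(f)$ is obtained by $\hat{\mathbf{L}} = (\mathbf{G}^T \mathbf{A}\, \mathbf{G})^{-1}
\mathbf{G}^T \mathbf{A}\, \mathbf{L}$, where $\mathbf{A}$ is a diagonal $P \times P$ matrix where the $P$ diagonal elements are weights
applied to the individual directions $(\theta_p, \varphi_p)$, $p = 1, 2, \ldots, P$.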

28. The method of claim 20 where each weight is proportional to a solid angle 
associated with the corresponding direction. 

29. The method of claim 28 wherein the spatial functions are spherical harmonic 
functions. 

30. The method of claim 29 wherein the spherical harmonic functions include at
least zero- and first-order harmonics. 

31. The method of claim 20 wherein the spectral functions define filters $L_W(f)$,
$L_X(f)$, $L_Y(f)$, and $L_Z(f)$, effective for decoding B-format encoded signals $W_L$, $X_L$, $Y_L$, $Z_L$, $W_R$,
$X_R$, $Y_R$, and $Z_R$, wherein the left-channel audio signal is defined by $W_L L_W(f) + X_L L_X(f) +
Y_L L_Y(f) + Z_L L_Z(f)$ and the right-channel audio signal is defined by $W_R L_W(f) + X_R L_X(f) -
Y_R L_Y(f) + Z_R L_Z(f)$, whereby the left- and right-channel audio signals are suitable for
playback with headphones.

32. The method of claim 20 wherein the spectral functions define filters $L_W(f)$,
$L_X(f)$, $L_Y(f)$, and $L_Z(f)$ effective for decoding B-format encoded signals $W_L$, $X_L$, $Y_L$, $Z_L$, $W_R$,
$X_R$, $Y_R$, and $Z_R$; wherein the left-channel audio signal comprises two signals

a first signal $LF = 0.5\,\{[W_L + X_L]\,[L_W(f) + L_X(f)] + Y_L L_Y(f) + Z_L L_Z(f)\}$ and
a second signal $LB = 0.5\,\{[W_L - X_L]\,[L_W(f) - L_X(f)] + Y_L L_Y(f) + Z_L L_Z(f)\}$;

and wherein the right-channel audio signal comprises two signals

a first signal $RF = 0.5\,\{[W_R + X_R]\,[L_W(f) + L_X(f)] - Y_R L_Y(f) + Z_R L_Z(f)\}$ and
a second signal $RB = 0.5\,\{[W_R - X_R]\,[L_W(f) - L_X(f)] - Y_R L_Y(f) + Z_R L_Z(f)\}$;

whereby the left- and right-channel audio signals are suitable for playback over a pair
of front speakers and a pair of rear speakers.

33. The method of claim 32 further including: 

performing a first cross-talk cancellation on the LF and RF signals to feed the 
front speakers; and 

performing a second cross-talk cancellation on the LB and RB signals to feed 
the rear speakers. 

34. The method of claim 20 wherein the spatial functions are discrete panning 
functions having a direction, called a principal direction, where the spatial function is 
maximum and wherein all other spatial functions are zero. 

35. The method of claim 34 wherein the spectral function associated with each 
spatial function is the delay-free HRTF for the corresponding principal direction. 

36. The method according to claims 34 or 35 wherein one or more of the spatial 
functions have their principal direction corresponding to the direction of one of the 
loudspeakers. 

37. The method according to claims 33 or 36 including performing cross-talk 
cancellation of the left and right audio signals before feeding the loudspeakers. 
38. The method of claims 34 or 35 further including:

producing left-front and left-back signals based on the left-channel audio
signal;

producing right-front and right-back signals based on the right-channel audio 
signal; and 

combining the left-front, left-back, right-front, and right-back signals to 
produce outputs suitable for playback with a pair of front speakers and a pair of rear 
speakers. 

39. The method of claim 38 further including: 

performing a first cross-talk cancellation on the left-front and right-front
signals to feed the front speakers; and 

performing a second cross-talk cancellation on the left-back and right-back 
signals to feed the rear speakers. 

40. The method of claim 39 wherein one or more of the spatial functions have 
their principal direction corresponding to the direction of the loudspeakers. 

41. A method for reproducing an audio scene comprising:
selecting a set of spatial functions;

producing directionally encoded audio signals including receiving a first audio 
source and applying the spatial functions to the first audio source to produce first 
encoded signals; and 

decoding the encoded audio signals, including generating spectral functions 
based on the first spatial functions and applying the spectral functions to the encoded 
audio signals. 

42. The method of claim 41 further including delaying the first audio source to 
produce a delayed source, applying the spatial functions to the delayed source to 
produce second encoded signals, the first and second signals comprising directionally 
encoded audio signals. 
43. The method of claim 41 wherein the step of producing directionally encoded
audio signals further includes receiving a second audio source, applying the spatial 
functions to the second audio source to produce second encoded signals, and mixing 
the first and second encoded signals. 

44. The method of claim 43 wherein the second audio source is a synthesized 
audio signal. 

45. The method of claim 41 wherein the spatial functions are spherical harmonic 
functions. 

46. The method of claim 45 wherein the spherical harmonic functions include at
least zero- and first-order harmonics. 

47. The method of claim 41 wherein the spatial functions are discrete panning 
functions. 

48. The method of claim 41 wherein the step of applying the spectral functions to 
the directionally encoded audio signals includes providing a set of filters defined by 
the spectral functions and feeding the encoded audio signals into the filters to produce 
reconstructed audio signals. 

49. The method of claim 41 further including performing a cross-talk cancellation 
operation on the reconstructed audio signals to produce output suitable for playback 
with speakers. 